Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where fvh fragments could be loaded from wrong doc #65641

Merged
merged 6 commits into from
Dec 9, 2020

Conversation

jtibshirani
Copy link
Contributor

This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.

Some FragmentsBuilder implementations contain a SourceLookup to load from
_source. The lookup should be positioned to load from the current hit document.
However, since FragmentsBuilder are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.

The regression was introduced in #60179, which started storing SourceLookup
on FragmentsBuilder.

Fixes #65533.

@@ -160,8 +160,13 @@ public HighlightField highlight(FieldHighlightContext fieldContext) throws IOExc
}
cache.fvh.setPhraseLimit(field.fieldOptions().phraseLimit());

String[] fragments;
// If the fragments builder requires document _source, pass it the source lookup from the hit context.
if (entry.fragmentsBuilder instanceof SourceBasedFragmentsBuilder) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hacky but felt like the best way to fix this right now. FragmentsBuilder is an external (Lucene) interface and there's not a good way to pass in _source to its methods like getFields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we replace the cached FragmentsBuilder with a Function<SourceLookup, FragmentsBuilder> and then call that with the new SourceLookup for each hit? The FragmentsBuilders themselves don't seem to do any significant processing on construction so I don't think there would be a performance penalty for rebuilding them each time.

Copy link
Contributor Author

@jtibshirani jtibshirani Dec 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like that should work! I'd like to check I understand the context. We have these cached entries not because fragment builders are expensive to build, but to avoid rebuilding the FieldQuery which looks more complex?

@jtibshirani jtibshirani force-pushed the fvh branch 3 times, most recently from a4b4af4 to ab7abb6 Compare December 1, 2020 00:46
@jtibshirani jtibshirani marked this pull request as ready for review December 1, 2020 05:56
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Dec 1, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@jtibshirani
Copy link
Contributor Author

Thanks @romseygeek for the suggestion, I tried out the new approach.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jtibshirani!

@jtibshirani jtibshirani merged commit ddf1f4c into elastic:master Dec 9, 2020
@jtibshirani jtibshirani deleted the fvh branch December 9, 2020 22:47
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Dec 9, 2020
…5641)

This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.

Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.

The regression was introduced in elastic#60179, which started storing `SourceLookup`
on `FragmentsBuilder`.

Fixes elastic#65533.
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Dec 10, 2020
…5641)

This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.

Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.

The regression was introduced in elastic#60179, which started storing `SourceLookup`
on `FragmentsBuilder`.

Fixes elastic#65533.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FVH Highlighter does not work with the inner hits properly
4 participants